@Xunzhuo Xunzhuo commented Nov 3, 2025

What type of PR is this?

Feature enhancement for intelligent routing system

What this PR does / why we need it:

This PR implements intent-aware LoRA (Low-Rank Adaptation) routing support, enabling the semantic router to automatically select different LoRA adapters based on the classified intent/category of incoming requests.

Key Changes:

  1. Configuration Structure:

    • Added LoRAAdapter struct to define available LoRA adapters per model in model_config
    • Added lora_name field to ModelScore for specifying which LoRA adapter to use
    • LoRA adapters must be pre-defined in model_config before being referenced
  2. Validation:

    • Implemented validateLoRAName() to ensure lora_name references a valid LoRA defined in the model's configuration
    • Provides clear error messages with available LoRA names when validation fails
  3. Model Selection Logic:

    • Updated selectBestModelInternal() to use LoRA name as the final model name when specified
    • Updated GetModelsForCategory() to return LoRA names for proper filtering
    • When a request is classified, the router now returns the LoRA adapter name instead of the base model name
  4. Documentation & Examples:

    • Added comprehensive example configuration: config/intelligent-routing/in-tree/lora_routing.yaml
    • Updated documentation in website/docs/overview/categories/configuration.md
    • Updated README to reflect LoRA adapter routing capability
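
The configuration and validation changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the names LoRAAdapter, ModelScore, and validateLoRAName come from the description, while the ModelParams type, the field layouts, and the function signature are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// LoRAAdapter describes one adapter available on a base model.
type LoRAAdapter struct {
	Name        string
	Description string
}

// ModelParams is a hypothetical stand-in for one model_config entry.
type ModelParams struct {
	ReasoningFamily string
	LoRAs           []LoRAAdapter
}

// ModelScore pairs a base model with an optional LoRA adapter and a score.
type ModelScore struct {
	Model    string
	LoRAName string // empty means "use the base model"
	Score    float64
}

// validateLoRAName returns an error when ms.LoRAName is set but is not
// defined under ms.Model in modelConfig. The signature is assumed.
func validateLoRAName(modelConfig map[string]ModelParams, ms ModelScore) error {
	if ms.LoRAName == "" {
		return nil // nothing to validate
	}
	params, ok := modelConfig[ms.Model]
	if !ok {
		return fmt.Errorf("model %q is not defined in model_config", ms.Model)
	}
	available := make([]string, 0, len(params.LoRAs))
	for _, l := range params.LoRAs {
		if l.Name == ms.LoRAName {
			return nil
		}
		available = append(available, l.Name)
	}
	// List the defined adapters so a typo is easy to spot.
	return fmt.Errorf("lora_name %q is not defined for model %q (available: %s)",
		ms.LoRAName, ms.Model, strings.Join(available, ", "))
}

func main() {
	cfg := map[string]ModelParams{
		"llama2-7b": {
			ReasoningFamily: "llama2",
			LoRAs: []LoRAAdapter{
				{Name: "technical-lora", Description: "Optimized for technical questions"},
				{Name: "medical-lora", Description: "Specialized for medical domain"},
			},
		},
	}
	valid := ModelScore{Model: "llama2-7b", LoRAName: "technical-lora", Score: 1.0}
	typo := ModelScore{Model: "llama2-7b", LoRAName: "techncal-lora", Score: 1.0}
	fmt.Println(validateLoRAName(cfg, valid))
	fmt.Println(validateLoRAName(cfg, typo))
}
```

Listing the available adapter names in the error mirrors the "clear error messages" goal stated above; the exact wording of the real messages may differ.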

Configuration Example:

# Define LoRA adapters in model_config
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    loras:
      - name: "technical-lora"
        description: "Optimized for technical questions"
      - name: "medical-lora"
        description: "Specialized for medical domain"

# Reference them in categories
categories:
  - name: "technical"
    model_scores:
      - model: "llama2-7b"
        lora_name: "technical-lora"  # Routes to technical LoRA
        score: 1.0

How It Works:

  1. LoRA adapters are defined in model_config under the base model
  2. Request is classified into a category (e.g., "technical")
  3. Router selects the best ModelScore for that category
  4. The configuration validator ensures lora_name is defined in the model's loras list
  5. If lora_name is specified, it replaces the base model name
  6. The request is sent to vLLM with model="technical-lora"
  7. vLLM serves the request with the LoRA adapter registered under that name
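
The selection step in the list above can be sketched as below. This is only an illustration under stated assumptions: selectModelName is a hypothetical helper, not selectBestModelInternal() itself, and the tie-breaking rule (first entry wins on equal scores) is an assumption rather than documented behavior.

```go
package main

import "fmt"

// ModelScore pairs a base model with an optional LoRA adapter and a score.
// The field layout is assumed for this sketch.
type ModelScore struct {
	Model    string
	LoRAName string
	Score    float64
}

// selectModelName picks the highest-scoring entry for a category and returns
// the name to send upstream: the LoRA name when set, otherwise the base model.
// The boolean is false when the category has no model scores.
func selectModelName(scores []ModelScore) (string, bool) {
	if len(scores) == 0 {
		return "", false
	}
	best := scores[0]
	for _, s := range scores[1:] {
		if s.Score > best.Score { // strict comparison: the first entry wins ties
			best = s
		}
	}
	if best.LoRAName != "" {
		return best.LoRAName, true
	}
	return best.Model, true
}

func main() {
	technical := []ModelScore{
		{Model: "llama2-7b", LoRAName: "technical-lora", Score: 1.0},
		{Model: "llama2-7b", Score: 0.5},
	}
	name, ok := selectModelName(technical)
	fmt.Println(name, ok)
}
```

With the two entries in main, the adapter entry has the higher score, so the returned name is the LoRA name rather than "llama2-7b". Raising or lowering a score is also how one adapter version could be preferred over another.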

Benefits:

  • Domain Expertise: Fine-tuned adapters for specific domains (technical, medical, legal, etc.)
  • Cost Efficiency: Share base model weights across adapters, reducing memory footprint
  • Easy A/B Testing: Compare adapter versions by adjusting scores
  • Flexible Deployment: Add/remove adapters without router restart
  • Configuration Validation: Prevents typos and missing LoRA definitions

Prerequisites:

  • vLLM server must be started with --enable-lora flag
  • LoRA adapters must be registered using --lora-modules parameter
  • LoRA names in config must match those registered with vLLM

Fixes: #545

- Add LoRAAdapter struct to define available LoRA adapters per model
- Add lora_name field to ModelScore for specifying LoRA adapter
- Implement validation to ensure lora_name references defined LoRAs
- Update model selection logic to use LoRA name when specified
- Add comprehensive example configuration and documentation
- Update README to reflect LoRA adapter routing capability

This enables the semantic router to route requests to different LoRA
adapters based on the classified intent/category, so that domain-specific
fine-tuned models are selected automatically.

Fixes: #545
Signed-off-by: bitliu <[email protected]>

netlify bot commented Nov 3, 2025

Deploy Preview for vllm-semantic-router ready!

Name | Link
🔨 Latest commit | bea9fa9
🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/6908be21c3d4b20008f18cc1
😎 Deploy Preview | https://deploy-preview-579--vllm-semantic-router.netlify.app


github-actions bot commented Nov 3, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/intelligent-routing/in-tree/lora_routing.yaml

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/tutorials/intelligent-route/lora-routing.md
  • website/docs/overview/categories/configuration.md
  • website/sidebars.ts

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • README.md

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/classification/classifier.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/validator.go


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Add hybrid-cache.md to the Semantic Cache section in sidebar.

Signed-off-by: bitliu <[email protected]>
Fix broken link from getting-started/quickstart.md to installation/installation.md

Signed-off-by: bitliu <[email protected]>
Signed-off-by: bitliu <[email protected]>
@rootfs rootfs requested a review from Copilot November 3, 2025 14:18

Copilot AI left a comment

Pull Request Overview

This PR adds support for intent-aware LoRA (Low-Rank Adaptation) routing in the Semantic Router, enabling automatic selection of domain-specific LoRA adapters based on classified query intent.

Key changes:

  • Added LoRA adapter configuration support in model definitions with validation
  • Implemented automatic LoRA name substitution in model selection logic
  • Added comprehensive tutorial and configuration documentation

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

File | Description
website/sidebars.ts | Added new tutorial entries for LoRA routing and hybrid cache
website/docs/tutorials/intelligent-route/lora-routing.md | New tutorial documenting LoRA routing setup and configuration
website/docs/overview/categories/configuration.md | Extended documentation with LoRA adapter configuration details
src/semantic-router/pkg/config/config.go | Added LoRA adapter data structures to support LoRA configuration
src/semantic-router/pkg/config/validator.go | Added validation logic to ensure LoRA names reference defined adapters
src/semantic-router/pkg/classification/classifier.go | Implemented LoRA name substitution in model selection
config/intelligent-routing/in-tree/lora_routing.yaml | Added complete example configuration demonstrating LoRA routing
README.md | Updated feature description and removed distributed tracing section

Xunzhuo and others added 2 commits November 3, 2025 22:36
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Xunzhuo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Xunzhuo <[email protected]>
@rootfs rootfs merged commit 3951728 into main Nov 3, 2025
16 checks passed

Development

Successfully merging this pull request may close these issues.

Support Intent-Aware LoRA Routing

4 participants